(54) Title: CONVOLUTIVE BLIND SOURCE SEPARATION USING A MULTIPLE DECORRELATION METHOD
(57) Abstract 



A method and apparatus that perform blind source separation using
convolutive signal decorrelation. More specifically, the method accumulates
a length of input signal (mixed signal) that comprises a plurality of
independent signals from independent signal sources. The invention then
divides the length of input signal into a plurality of T-length periods
(windows) and performs a discrete Fourier transform (DFT) on the signal
within each T-length period. Thereafter, estimated cross-correlation values
are computed using a plurality of the average DFT values. A total number of
K cross-correlation values are computed, where each of the K values is
averaged over N of the T-length periods. Using the cross-correlation values,
a gradient descent process computes the coefficients of a FIR filter that
will effectively separate the source signals within the input signal. To
achieve an accurate solution, the gradient descent process is constrained in
that the time-domain values of the filter coefficients can attain only
certain values, i.e., the time-domain filter coefficient values $W(\tau)$ are
constrained within each T-length period to be zero for any time $\tau > Q$.
In this manner, a unique solution for the FIR filter coefficients is computed
and a filter produced using these coefficients will effectively separate the
source signals.
CONVOLUTIVE BLIND SOURCE SEPARATION
USING A MULTIPLE DECORRELATION METHOD 

This application claims the benefit of U.S. Provisional Application No.
60/081,101 filed April 8, 1998, which is herein incorporated by reference.

The invention relates to signal processing and, more particularly, the
invention relates to a method and apparatus for performing signal separation
using a multiple decorrelation technique.

BACKGROUND OF THE INVENTION

A growing number of researchers have recently published techniques that
perform blind source separation (BSS), i.e., separating a composite signal into
its constituent component signals without a priori knowledge of those signals.
These techniques find use in various applications such as speech detection using
multiple microphones, crosstalk removal in multichannel communications,
multipath channel identification and equalization, direction of arrival (DOA)
estimation in sensor arrays, improvement of beam forming microphones for
audio and passive sonar, and discovery of independent source signals in various
biological signals, such as EEG, MEG and the like. Many of the BSS techniques
require (or assume) a statistical dependence between the component signals to
accurately separate the signals. Additional theoretical progress in signal
modeling has generated new techniques that address the problem of identifying
statistically independent signals - a problem that lies at the heart of source
separation.

The basic source separation problem is simply described by assuming
statistically independent sources $s(t) = [s_1(t), \ldots, s_{d_s}(t)]^T$ that have
been convolved and mixed in a linear medium leading to sensor signals
$x(t) = [x_1(t), \ldots, x_{d_x}(t)]^T$ that may include additional sensor noise
$n(t)$. The convolved, noisy signal is represented in the time domain by the
following equation (known as a forward model):

$$x(t) = \sum_{\tau=0}^{P} A(\tau)\, s(t-\tau) + n(t) \qquad (1)$$

Source separation techniques are used to identify the $d_x d_s P$ coefficients of the
channels A and to ultimately determine an estimate $\hat{s}(t)$ for the unknown
source signals.



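Alternatively, the convolved signal may be filtered using a finite impulse
response (FIR) inverse model represented by the following equation (known as
the backward model):

$$u(t) = \sum_{\tau=0}^{Q} W(\tau)\, x(t-\tau) \qquad (2)$$

In this representation, a BSS technique must estimate the FIR inverse
components W such that the model source signals
$u(t) = [u_1(t), \ldots, u_{d_u}(t)]^T$ are statistically independent.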

An approach to performing source separation under the statistical
independence condition has been discussed in Weinstein et al., "Multi-Channel
Signal Separation by Decorrelation", IEEE Transactions on Speech and Audio
Processing, vol. 1, no. 4, pp. 405-413, 1993, where, for non-stationary signals, a
set of second order conditions is specified that uniquely determines the
parameters A in the forward model. However, no specific algorithm for
performing source separation based on non-stationarity is given in the
Weinstein et al. paper.

Early work in the signal processing community had suggested
decorrelating the measured signals, i.e., diagonalizing measured correlations for
multiple time delays. For an instantaneous mix, also referred to as the constant
gain case, it has been shown that for non-white signals decorrelation using
multiple filter taps is sufficient to recover the source signals. However, for
convolutive mixtures of wide-band signals, this technique does not produce a
unique solution and, in fact, may generate source estimates that are
decorrelated but not statistically independent. As clearly identified by
Weinstein et al. in the paper cited above, additional conditions are required to
achieve a unique solution of statistically independent sources. In order to find
statistically independent source signals, it is necessary to capture more than
second order statistics, since statistical independence requires that not only
second but all higher cross moments vanish.

In the convolutive case, Yellin and Weinstein in "Multichannel Signal
Separation: Methods and Analysis", IEEE Transactions on Signal Processing, vol.
44, no. 1, pp. 106-118, January 1996 established conditions on higher order
multi-tap cross moments that allow convolutive cross talk removal. Although
the optimization criteria extend naturally to higher dimensions, previous
research has concentrated on a two dimensional case because a multi-channel
FIR model (see equation 2) can be inverted with a properly chosen architecture
using estimated forward filters. Heretofore, for higher dimensions, finding a
stable approximation of the forward model has been elusive.

These prior art techniques generally operate satisfactorily in computer
simulations but perform poorly for real signals, e.g., audio signals. One could
speculate that the signal densities of the real signals may not have the
hypothesized structures, the higher order statistics may lead to estimation
instabilities, or a violation of the signal stationarity condition may cause
inaccurate solutions.

Therefore, there is a need in the art for a blind source separation
technique that accurately performs convolutive signal decorrelation.

SUMMARY OF THE INVENTION 
The disadvantages of the prior art are overcome by a method and
apparatus that performs blind source separation using convolutive signal
decorrelation by simultaneously diagonalizing second order statistics at multiple
time periods. More specifically, the invention accumulates a length (segment) of
input signal that comprises a mixture of independent signal sources. The
invention then divides the length of input signal into a plurality of T-length
periods (windows) and performs a discrete Fourier transform (DFT) on the
mixed signal over each T-length period. Thereafter, the invention computes K
cross-correlation power spectra that are each averaged over N of the T-length
periods. Using the cross-correlation power values, a gradient descent process
computes the coefficients of a FIR filter that will effectively separate the source
signals within the input signal by simultaneously decorrelating the K cross-
correlation power spectra. To achieve an accurate solution, the gradient descent
process is constrained in that the time-domain values of the filter coefficients
can attain only certain values, i.e., the time-domain filter coefficient values $W(\tau)$
are constrained within the T-length period to be zero for any time $\tau > Q \ll T$. In
this manner, the so-called "permutation problem" is solved and a unique
solution for the FIR filter coefficients is computed such that a filter produced
using these coefficients will effectively separate the source signals.

Generally, the invention is implemented as a software routine that is
stored in a storage medium and executed on a general purpose computer
system. However, a hardware implementation is readily apparent from the
following detailed description.

The present invention finds application in a voice recognition system as a
signal preprocessor system for decorrelating signals from different sources such
that a voice recognition processor can utilize the various voice signals that are
separated by the invention. In response to the voice signals, the voice
recognition processor can then produce computer commands or computer text.



BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by
considering the following detailed description in conjunction with the
accompanying drawings, in which:

FIG. 1 depicts a system for executing a software implementation of the
present invention;

FIG. 2 is a flow diagram of a method of the present invention;

FIG. 3 depicts a frequency domain graph of the filter coefficients
generated by the present invention; and

FIG. 4 depicts a time domain graph of the filter coefficients generated by
the present invention.

To facilitate understanding, identical reference numerals have been used,
where possible, to designate identical elements that are common to the figures.



DETAILED DESCRIPTION 

The present invention estimates values of parameters W for the backward
model of equation 2 by assuming non-stationary source signals and using a least
squares (LS) optimization to estimate W as well as signal and noise powers. The
invention transforms the source separation problem into the frequency domain
and solves simultaneously a source separation problem for every frequency.

FIG. 1 depicts a system 100 for implementing the source separation
method of the present invention. The system 100 comprises a convolved signal
source 126 that supplies the signal that is to be separated into its component
signals and a computer system 108 that executes the multiple decorrelation
routine 124 of the present invention. The source 126 may contain any source of
convolved signals, but is illustratively shown to contain a sensor array 102, a
signal processor 104 and a recorded signal source 106. The sensor array
contains one or more transducers 102A, 102B, 102C such as microphones. The
transducers are coupled to a signal processor 104 that performs signal
digitization. A digital signal is coupled to the computer system 108 for signal
separation and further processing. A recorded signal source 106 may optionally
form a source of the convolutive signals that require separation.

The computer system 108 comprises a central processing unit (CPU) 114,
a memory 122, support circuits 116, and an input/output (I/O) interface 120.
The computer system 108 is generally coupled through the I/O interface 120 to
a display 112 and various input devices 110 such as a mouse and keyboard. The
support circuits generally contain well-known circuits such as cache, power
supplies, clock circuits, a communications bus, and the like. The memory 122
may include random access memory (RAM), read only memory (ROM), disk
drive, tape drive, and the like, or some combination of memory devices. The
invention is implemented as the multiple decorrelation routine 124 that is
stored in memory 122 and executed by the CPU 114 to process the signal from
the signal source 126. As such, the computer system 108 is a general purpose
computer system that becomes a specific purpose computer system when
executing the routine 124 of the present invention. Although a general purpose
computer system is illustratively shown as a platform for implementing the
invention, those skilled in the art will realize that the invention can also be
implemented in hardware as an application specific integrated circuit (ASIC), a
digital signal processing (DSP) integrated circuit, or other hardware device or
devices. As such, the invention may be implemented in software, hardware, or a
combination of software and hardware.

The illustrative computer system 108 further contains a speech recognition
processor 118, e.g., a speech recognition circuit card or speech recognition
software, that is used to process the component signals that the invention
extracts from the convolutive signal. As such, a conference room having a
plurality of people speaking and background noise can be monitored with
multiple microphones 102. The microphones 102 produce a composite speech
signal that requires separation into component signals if a speech recognition
system is to be used to convert each person's speech into computer text or into
computer commands. The composite speech signal is filtered, amplified and
digitized by the signal processor 104 and coupled to the computer system 108.
The CPU 114, executing the multiple decorrelation routine 124, separates the
composite signal into its constituent signal components. From these constituent
components, background noise can easily be removed. The constituent
components without noise are then coupled to the speech recognition processor
118 to process the component signals into computer text or computer
commands. In this manner, the computer system 108 while executing the
multiple decorrelation routine 124 is performing signal pre-processing or
conditioning for the speech recognition processor 118.

FIG. 2 depicts a flow diagram of the multiple decorrelation routine 124 of
the present invention. At step 200, the convolutive (mixed) signal is input, the
signal is parsed into a plurality of windows containing T samples of the input
signal $X(t)$, and the routine produces discrete Fourier transform (DFT) values
for each window $\chi(t)$, i.e., one DFT value for each window of length T samples.

At step 202, the routine 124 uses the DFT values to accumulate K cross-
correlation power spectra, where each of the K spectra is averaged over N
windows of length T samples.

For non-stationary signals, the cross correlation estimates will be
dependent on the absolute time and will indeed vary from one estimation
segment (an NT period) to the next. The cross correlation estimates computed
in step 204 are represented as:

$$\hat{R}_x(t,\nu) = \frac{1}{N} \sum_{n=0}^{N-1} \chi(t+nT,\nu)\, \chi^H(t+nT,\nu) \qquad (5)$$

where:

$$\chi(t+nT,\nu) = \mathrm{FFT}\; X(t+nT)$$

$$X(t) = [x(t), \ldots, x(t+T-1)]$$

and $\chi(\nu)$ is the FFT of the input signal within a window containing T samples.
As such, the routine, at step 204, computes a matrix for each time t and for each
frequency $\nu$ and then sums all the matrix components with each other matrix
component. Steps 206, 208, 210 and 212 iterate the correlation estimation of
step 204 over n=0 to N and k=0 to K to produce the K spectra.
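The averaging in equation 5 can be illustrated with a minimal Python sketch. The function names `dft` and `cross_power_spectra`, the nested-list data layout, and the choice of non-overlapping windows are assumptions made for illustration only; a naive DFT stands in for the FFT so that the example needs nothing beyond the standard library.

```python
import cmath

def dft(frame):
    """Naive O(T^2) DFT of one T-sample window (stand-in for an FFT)."""
    T = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * v * t / T)
                for t in range(T))
            for v in range(T)]

def cross_power_spectra(x, T, N, K):
    """Estimate K cross-power spectra, each averaged over N windows.

    x is a list of d sensor channels, each holding at least K*N*T samples.
    Returns R with R[k][v][i][j] = (1/N) * sum_n X_i(v) * conj(X_j(v)),
    using non-overlapping windows and estimation periods (t_k = k*N*T).
    """
    d = len(x)
    R = []
    for k in range(K):
        # One d x d accumulator matrix per frequency bin v.
        acc = [[[0j] * d for _ in range(d)] for _ in range(T)]
        for n in range(N):
            start = (k * N + n) * T
            X = [dft(channel[start:start + T]) for channel in x]
            for v in range(T):
                for i in range(d):
                    for j in range(d):
                        acc[v][i][j] += X[i][v] * X[j][v].conjugate() / N
        R.append(acc)
    return R
```

Each `R[k][v]` is a Hermitian matrix whose diagonal holds the averaged power of each sensor channel at frequency bin `v`, and whose off-diagonal entries are the cross-power terms that the later steps try to drive to zero.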

Equation 5 can then be simplified to a matrix representation:

$$\hat{R}_x(t,\nu) = A(\nu)\, \Lambda_s(t,\nu)\, A^H(\nu) + \Lambda_n(t,\nu) \qquad (6)$$

If N is sufficiently large, $\Lambda_s(t,\nu)$ and $\Lambda_n(t,\nu)$ can be modeled as diagonal
matrices due to the signal independence assumption. For Equation 6 to be
linearly independent for different times, it will be necessary that $\Lambda_s(t,\nu)$ changes
over time, i.e., the signals are non-stationary.

Using the cross correlation estimates of equation 6, the invention
computes the source signals using cross-power-spectra satisfying the following
equation:

$$\hat{\Lambda}_s(t,\nu) = W(\nu)\left(\hat{R}_x(t,\nu) - \Lambda_n(t,\nu)\right) W^H(\nu) \qquad (7)$$

In order to obtain independent conditions for every time period, the time
periods are generally chosen to have non-overlapping estimation times for
$\hat{R}_x(t_k,\nu)$, i.e., $t_k = kTN$. But if the signals vary sufficiently fast, overlapping
estimation times may be utilized. Furthermore, although the windows T are
generally sequential, the windows may overlap one another such that each DFT
value is derived from signal information that is also contained in the previous
window. In an audio signal processing system, the specific value of T is selected
based upon room acoustics of the room in which the signals are recorded. For
example, a large room with many reverberation paths requires a long window T
such that the invention can process a substantial amount of signal information
to achieve source signal separation. The value of N is generally determined by
the amount of available data for processing. Typical values are N = 20, T =
1024 samples and K = 5.
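To make the typical values concrete, the short calculation below works out how many samples one processing block spans when the K estimation periods do not overlap. The sampling rate is a hypothetical figure chosen only to convert samples into seconds; the text does not specify one.

```python
# Typical values quoted in the text.
N, T, K = 20, 1024, 5

samples_per_period = N * T      # one estimation period (an NT segment)
samples_per_block = K * N * T   # all K non-overlapping estimation periods

# Hypothetical sampling rate, assumed only for illustration.
sample_rate_hz = 16_000
block_seconds = samples_per_block / sample_rate_hz

print(samples_per_period, samples_per_block, block_seconds)
# prints: 20480 102400 6.4
```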

The inventive method computes a multipath channel W (i.e., the tap
values of a multidimensional FIR filter) that simultaneously satisfies equation 7
for K estimation periods, e.g., 2 to 5 estimation periods for processing audio
signals. Such a process is performed at steps 214, 216, 218 (collectively a filter
parameter estimation process 224) and is represented using a least squares
estimation procedure as follows:

$$\hat{W}, \hat{\Lambda}_s, \hat{\Lambda}_n = \arg\min_{\substack{W, \Lambda_s, \Lambda_n \\ W(\tau)=0,\ \tau>Q \\ W_{ii}(\nu)=1}} \sum_{\nu=1}^{T} \sum_{k=1}^{K} \left\| E(k,\nu) \right\|^2 \qquad (8)$$

where

$$E(k,\nu) = W(\nu)\left(\hat{R}_x(k,\nu) - \Lambda_n(k,\nu)\right) W^H(\nu) - \Lambda_s(k,\nu)$$

For simplicity, a short form nomenclature has been used in equation 8, where
$\Lambda_s(k,\nu) = \Lambda_s(t_k,\nu)$ and $\Lambda_s = \Lambda_s(t_1,\nu), \ldots, \Lambda_s(t_K,\nu)$, and the same simplified notation
also applies to $\Lambda_n(t,\nu)$ and $\hat{R}_x(t,\nu)$.

To produce the parameters W, a gradient descent process 224 (containing
steps 214, 216, 218, and 220) is used that iterates the values of W as cost
function (8) is minimized. In step 216, the W values are updated as
$W^{new} = W^{old} - \mu \nabla_W E$, where $\nabla_W E$ is the gradient step value and $\mu$ is a
weighting constant that controls the size of the update.

More specifically, the gradient descent process determines the gradient values
as follows.

$$\frac{\partial E}{\partial W^*(\nu)} = \sum_{k=1}^{K} E(k,\nu)\, W(\nu)\, B^H(k,\nu) \qquad (10)$$

$$\frac{\partial E}{\partial \Lambda_s^*(k,\nu)} = -\mathrm{diag}\left(E(k,\nu)\right) \qquad (11)$$

$$\frac{\partial E}{\partial \Lambda_n^*(k,\nu)} = -\mathrm{diag}\left(W^H(\nu)\, E(k,\nu)\, W(\nu)\right) \qquad (12)$$

$$B(k,\nu) = \hat{R}_x(k,\nu) - \Lambda_n(k,\nu) \qquad (13)$$

With equation 11 equal to zero, the routine can solve explicitly for
parameters $\Lambda_s(k,\nu)$, while parameters $\Lambda_n(k,\nu)$ and $W(\nu)$ are computed with a
gradient descent rule, e.g., new values of $\Lambda_s(k,\nu)$ and $W(\nu)$ are computed with
each pass through the routine until the new values of $W(\nu)$ are not very different
from the old values of $W(\nu)$, i.e., W is converged.
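As a rough illustration of one pass of this procedure at a single frequency, the Python sketch below solves for the signal powers in closed form (equation 11 set to zero) and then takes one gradient step on W using the form of equation 10. All function names are hypothetical, the noise powers are held fixed inside B rather than updated, and resetting the diagonal of W to 1 after each step is an assumed, simple way of honoring the normalization constraint attached to equation 8.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def herm(A):
    """Conjugate (Hermitian) transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def solve_lambda_s(W, B):
    """Setting equation 11 to zero gives Lambda_s = diag(W B W^H)."""
    M = matmul(matmul(W, B), herm(W))
    return [M[i][i] for i in range(len(M))]

def error(W, B, lam_s):
    """E = W B W^H - Lambda_s, with B = R_x - Lambda_n (equation 13)."""
    M = matmul(matmul(W, B), herm(W))
    return [[M[i][j] - (lam_s[i] if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

def cost(W, Bs):
    """Sum over k of the squared Frobenius norm of E(k), one frequency."""
    total = 0.0
    for B in Bs:
        E = error(W, B, solve_lambda_s(W, B))
        total += sum(abs(e) ** 2 for row in E for e in row)
    return total

def gradient_step(W, Bs, mu):
    """One update W <- W - mu * sum_k E(k) W B(k)^H (form of equation 10)."""
    d = len(W)
    G = [[0j] * d for _ in range(d)]
    for B in Bs:
        E = error(W, B, solve_lambda_s(W, B))
        term = matmul(matmul(E, W), herm(B))
        G = [[G[i][j] + term[i][j] for j in range(d)] for i in range(d)]
    W_new = [[W[i][j] - mu * G[i][j] for j in range(d)] for i in range(d)]
    for i in range(d):
        W_new[i][i] = 1.0  # assumed handling of the W_ii = 1 constraint
    return W_new
```

For example, with a single estimation period `Bs = [[[2, 1], [1, 2]]]` and `W` initialized to the identity, `cost` starts at 2.0 and drops after `gradient_step(W, Bs, 0.05)`, because the off-diagonal (cross-power) terms shrink.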
Note that equation 8 contains an additional constraint on the filter size in
the time domain. Up to that constraint it would seem the various frequencies
$\nu = 1, \ldots, T$ represent independent problems. The solutions $W(\nu)$ however are
restricted to those filters that have no time response beyond $\tau > Q \ll T$.
Effectively, the routine parameterizes $T d_s d_x$ filter coefficients in $W(\nu)$ with $Q d_s d_x$
parameters $W(\tau)$. In practice, the values of W are produced in the frequency
domain, at step 214, e.g., $W(\nu)$, then, at step 218, an FFT is performed on these
frequency domain values to convert the values of $W(\nu)$ to the time domain, e.g.,
$W(\tau)$. In the time domain, any W value that appears at a time greater than a
time Q is set to zero and all values in the range below Q are not adjusted. The
adjusted time domain values of W are then converted using an inverse FFT back to
the frequency domain. By zeroing the filter response in the time domain for all
time greater than Q, the frequency response of the filter is smoothed such that a
unique solution at each frequency is readily determined.
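The transform, truncate, transform-back sequence can be sketched in Python for one filter element. The names `dft` and `constrain_filter_length` are hypothetical. Two conventions are assumed here: the sketch uses an inverse DFT to reach the time domain and a forward DFT to return (the text names the two transforms the other way round, which affects only the naming of the pair), and it keeps exactly Q taps, at indices 0 to Q-1.

```python
import cmath

def dft(seq, inverse=False):
    """Naive DFT pair; the inverse carries the 1/n normalization."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * v * t / n)
               for t in range(n))
           for v in range(n)]
    return [c / n for c in out] if inverse else out

def constrain_filter_length(W_freq, Q):
    """Project one filter element onto responses that vanish beyond Q taps.

    W_freq holds the T frequency-domain values of a single element of W.
    """
    w_time = dft(W_freq, inverse=True)            # to the time domain
    w_time = [c if tau < Q else 0j                # zero the late taps
              for tau, c in enumerate(w_time)]
    return dft(w_time)                            # back to the frequency domain
```

Applying the function twice gives the same result as applying it once, which is what one expects of a projection onto the subspace of permissible solutions.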

FIG. 3 depicts an illustration of two frequency responses 302A and 302B
and FIG. 4 depicts an illustration of their corresponding time-domain responses
304A and 304B. The least squares solutions for the coefficients are found using
a gradient descent process performed at step 224 such that an iterative
approach is used to determine the correct values of W. Once the gradient in
equation 10 becomes "flat" as identified in step 220, the routine, at step 222,
applies the computed filter coefficients to an FIR filter. The FIR filter is used to
filter the samples of the input (mixed) signal x(t) in the time period KNT in
length. The FIR filter generates, at step 226, the decorrelated component
signals of the mixed signal. Then, the routine at step 228 gets the next KNT
number of samples for processing and proceeds to step 200 to filter the next
KNT samples. The previous KNT samples are removed from memory.
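Applying the computed coefficients amounts to a multichannel FIR convolution. The Python sketch below assumes the backward model has the usual FIR form, in which each output sample is the sum over taps of W(tau) applied to x(t - tau); the function name `separate` and the list layouts are illustrative assumptions, and samples before the start of the block are treated as zero.

```python
def separate(x, W):
    """Filter sensor signals with a multichannel FIR filter.

    x[j][t] is sample t of sensor channel j.
    W[tau][i][j] is the tap at delay tau from sensor j to output i.
    Returns u with u[i][t] = sum over tau and j of W[tau][i][j] * x[j][t - tau].
    """
    n_out = len(W[0])
    n_in = len(x)
    n_samples = len(x[0])
    u = [[0.0] * n_samples for _ in range(n_out)]
    for t in range(n_samples):
        for tau, taps in enumerate(W):
            if t - tau < 0:
                break  # no input history before the start of the block
            for i in range(n_out):
                for j in range(n_in):
                    u[i][t] += taps[i][j] * x[j][t - tau]
    return u
```

With `x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]` and a two-tap filter whose second tap subtracts the delayed second channel from the first, `W = [[[1, 0], [0, 1]], [[0, -1], [0, 0]]]`, the first output becomes `[1.0, -8.0, -17.0]` while the second channel passes through unchanged.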

As mentioned above, the gradient equations are constrained to remain in
the subspace of permissible solutions with $W(\tau) = 0$ for $\tau > Q \ll T$. This is
important since it is a necessary condition for equation 8 to achieve a good
approximation.

In practical applications, such as voice recognition signal preprocessing,
the inventive routine substantially enhances voice recognition accuracy,
i.e., word error rates improve by 5 to 50% and in some
instances approach the error rate that is achieved when no noise is present.
Error rate improvement has been shown when a desired voice signal is
combined with either music or another voice signal as background noise.

Although various embodiments which incorporate the teachings of the
present invention have been shown and described in detail herein, those skilled
in the art can readily devise many other varied embodiments that still
incorporate these teachings.
1. A method for separating a mixed signal into a plurality of component signals
comprising the steps of:

(a) producing a plurality of discrete Fourier transform (DFT) values,
where a DFT value is produced for every T samples of said mixed signal;

(b) producing a cross correlation estimation matrix that is averaged over
N of the DFT values;

(c) repeating step (b) K times to produce a plurality of cross correlation
estimation values;

(d) computing, using a gradient descent process, a plurality of filter
coefficients for a finite impulse response (FIR) filter using the cross correlation
estimation values; and

(e) filtering the mixed signal using the FIR filter having the computed
filter coefficients to separate the mixed signal into the plurality of component
signals.

2. The method of claim 1 wherein step (d) further comprises the steps of:

(d1) transforming the filter coefficients from the frequency domain into
the time domain;

(d2) zeroing any filter coefficients having a value other than zero for any
time that is greater than a predefined time Q to produce adjusted time domain
filter coefficients, where Q is less than T; and

(d3) transforming the adjusted time domain filter coefficients from the
time domain into the frequency domain to produce the filter coefficients used in
the gradient descent process.

3. The method of claim 1 wherein the cross correlation estimation values are
produced in step (b) using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N} \sum_{n=0}^{N-1} \chi(t+nT,\nu)\, \chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.

4. The method of claim 3 wherein the gradient descent process minimizes the
function:

$$\sum_{\nu=1}^{T} \sum_{k=1}^{K} \left\| E(k,\nu) \right\|^2$$

where $E(k,\nu)$ is an error function.

5. Apparatus for separating a mixed signal into a plurality of component signals
comprising:

means for producing a plurality of discrete Fourier transform (DFT)
values, where a DFT value is produced for every T samples of said mixed signal;

means for producing a plurality of cross correlation estimation values
using the DFT values;

a gradient descent processor for computing a plurality of filter coefficients
for a finite impulse response (FIR) filter using the cross correlation estimation
values; and

a filter for filtering the mixed signal using the FIR filter having the
computed filter coefficients to separate the mixed signal into the plurality of
component signals.

6. The apparatus of claim 5 wherein the gradient descent processor further
comprises:

a first transformer for transforming the filter coefficients from the
frequency domain into the time domain;

means for zeroing any filter coefficients having a value other than zero for
any time that is greater than a predefined time Q to produce adjusted time
domain filter coefficients, where Q is less than T; and

a second transformer for transforming the adjusted time domain filter
coefficients from the time domain into the frequency domain to produce the
filter coefficients used in the gradient descent process.
7. The apparatus of claim 5 wherein the cross correlation estimation values are
produced using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N} \sum_{n=0}^{N-1} \chi(t+nT,\nu)\, \chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.

8. The apparatus of claim 5 further comprising a voice recognition system for
processing at least one of the plurality of component signals.

9. The apparatus of claim 5 further comprising a plurality of microphones that 
produce the mixed signal. 

10. The apparatus of claim 5 wherein the gradient descent processor minimizes
the function:

$$\sum_{\nu=1}^{T} \sum_{k=1}^{K} \left\| E(k,\nu) \right\|^2$$

where $E(k,\nu)$ is an error function.

11. A computer readable storage medium containing a program that, when
executed upon a general purpose computer system causes the general purpose
computer system to become a specific purpose computer system that performs a
method for separating a mixed signal into a plurality of component signals
comprising the steps of:

(a) producing a plurality of discrete Fourier transform (DFT) values,
where a DFT value is produced for every T samples of said mixed signal;

(b) producing a cross correlation estimation matrix that is averaged over
N of the DFT values;

(c) repeating step (b) K times to produce a plurality of cross correlation
estimation values;

(d) computing, using a gradient descent process, a plurality of filter
coefficients for a finite impulse response (FIR) filter using the cross correlation
estimation values; and

(e) filtering the mixed signal using the FIR filter having the computed
filter coefficients to separate the mixed signal into the plurality of component
signals.

12. The computer readable medium of claim 11 wherein step (d) of the method
further comprises the steps of:

(d1) transforming the filter coefficients from the frequency domain into
the time domain;

(d2) zeroing any filter coefficients having a value other than zero for any
time that is greater than a predefined time Q to produce adjusted time domain
filter coefficients, where Q is less than T; and

(d3) transforming the adjusted time domain filter coefficients from the
time domain into the frequency domain to produce the filter coefficients used in
the gradient descent process.

13. The computer readable medium of claim 11 wherein the cross correlation
estimation values are produced in step (b) using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N} \sum_{n=0}^{N-1} \chi(t+nT,\nu)\, \chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.

14. The computer readable medium of claim 11 wherein the gradient descent
process minimizes the function:

$$\sum_{\nu=1}^{T} \sum_{k=1}^{K} \left\| E(k,\nu) \right\|^2$$

where $E(k,\nu)$ is an error function.